OPTIMAL PREDICTION FOR RADIATIVE TRANSFER: A NEW 
PERSPECTIVE ON MOMENT CLOSURE 



MARTIN FRANK AND BENJAMIN SEIBOLD 



Abstract. A direct numerical solution of the radiative transfer equation is typically expensive, 
since the radiative intensity depends on time, space and direction. An expansion in the direction 
• variables yields an equivalent system of infinitely many moments. A fundamental problem is how 

to truncate the system. Various closures have been presented in the literature. We formulate 
the method of optimal prediction, a strategy to approximate the mean solution of a large system 
Q by a smaller system, for radiation moment systems. To that end, the formalism is generalized to 

systems of partial differential equations. Using Gaussian measures, we re-derive linear closures, 
such as Pjy, diffusion, and diffusion correction closures. In addition, we propose a modification to 
existing closures. Although simple and with no extra cost, the newly derived crescendo-diffusion 
-h yields significantly better approximations in ID and 2D tests. 

-C 1. Introduction 

-i— > 

a 

In many fields, macroscopic equations can be derived from mesoscopic kinetic equations. For 
instance, in the Navier-Stokes and Euler equations, the macroscopic fluid variables, e.g. density 
and momentum, are moments of the phase space distribution of the Boltzmann equation. Simi- 
larly, in the equations of radiative transfer [35] , the direction dependent kinetic equations can be 
f***. transformed into a coupled system for the infinitely many moments. This can be interpreted as 

considering an (infinite) expansion in the kinetic variable and then taking only finitely many mem- 
bers of this expansion. This usually means that several "coordinates" of the field are neglected. 

\^ A common feature of existing closure strategies is that they are based on truncating and approx- 

imating the moment equations, and observing to which extent the solution of the approximate 
system is close to the true solution. The approximations are supported by physical arguments, 
such as higher moments being small or adjusting instantaneously. In Sect. [2] we derive a moment 
model for radiative transfer, and outline some classical linear closure strategies. 

A more potent strategy would be to have an identity for the evolution of the first N moments, 
and then deriving closures by approximating this identity. In this paper, we take this approach. 



We show that the method of optimal prediction [T4l [TTJ |T3j [9J [10] can be applied to the equations 
of radiative transfer and yields closed systems of finitely many moments. Optimal prediction, 
outlined in Sect. [3j approximates the mean solution of a large system by a smaller system, by 
averaging the equations with respect to an underlying probability measure. It can be understood 
as removing undesired modes, but in an averaged fashion, instead of merely neglecting them. 

Optimal prediction has been formulated for Hamiltonian partial differential equations 14, 15 , 
however, without exploiting the full formalism. It has been applied to partial differential equa- 
tions |16l however, only after reducing them to a system of ordinary differential equations 
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using a Fourier expansion or a semi-discretization step. In addition, most considered examples 
are Hamiltonian, for which a canonical measure exists. In contrast, here we encounter partial 
differential equations (in space) after a Fourier expansion (in the angular variable). In addition, 
no canonical measure exists for the system. Hence, the methodology has to be generalized to semi- 
groups. In Sect. [4] we define measures on function spaces. Furthermore, the radiation system is 
linear. Using Gaussian measures, the formalism is linear, and it yields an identity in the presence 
of a memory kernel. Here we restrict ourselves to Gaussian measures since they, as we will see, can 
generate all possible linear closures. In Sect. [5j we present linear optimal prediction in function 
spaces. 

The application to a simple model system (Sect. [6]) enlightens the procedure and various approx- 
imations. In Sect. [7] we apply linear optimal prediction to the radiation moment system, and 
derive existing and propose new closure relations. While the new formalism does not remove the 
arbitrariness in the closure procedure, it introduces it in a rational and comprehensible manner 
through the choice of a measure and the approximation of an integral. Thus, it allows a better 
understanding of the errors done due to the truncation of the infinite system. The performance of 
the newly derived closures is investigated numerically in Sect. [8] in one and two dimensional test 
problems. 



2. Moment Models for Radiative Transfer 

The equations of radiative transfer |32j are 

-d t I(x,n,t) + nVI(x,n,t) + (a(x) + K(x))I(x,Sl,t) = ^p- [ I(x,Q',t)dn' + q(x,t) . (1) 

In these equations, the radiative intensity I(x, f2, t) can be viewed as the the number of particles at 
time t, position x, traveling into direction f2. Equation ([I]) is a mesoscopic phase space equation, 
modeling absorption and emission (K-term), scattering (er-term), and containing a source term q. 
Due to the large number of unknowns, a direct numerical simulation of is very costly. Often 
times only the lowest moments of the intensity with respect to the direction are of interest. 
Moment models attempt to approximate by a coupled system of moments. 

For the sake of notational simplicity, we consider a slab geometry. However, all methods presented 
here can easily be generalized. Consider a plate that is finite along the x-axis and infinite in the 
y and z directions. The system is assumed to be invariant under translations in y and z and in 
rotations around the x-axis. In this case the radiative intensity can only depend on the scalar 
variable x and on the azimuthal angle 6 — arccos(/i) between the x-axis and the direction of flight. 
Furthermore, we select units such that c = 1 . The system becomes 

cr(x) f 1 

d t I(x, n, t) + fid x I(x, fi, t) + (cr(x) + k(x))I(x, fi, t) = J I(x, n', t) d// + q(x, t) (2) 

with t > 0, x £]a, b[, and /1 G [—1,1]. The system is supplied with boundary conditions that either 
prescribe ingoing characteristics or are periodic boundary conditions 

I{a,^,t)\n>o = \I(a,fi,t)=I(b,(i,t) 
/ ( 6 ,M, i )U<o = \lx(a,fJ-,t) = I x (b,(i,t) 

and initial conditions 

I(x,li,0) = I{x,n). 

Under very general assumptions, this problem admits a unique solution I £ C([0, r]; L 2 ([a, b] x 
[— 1, 1])), i.e. at each time / is square-integrable in both spatial and angular variable [18 . 
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2.1. Moment Models. Macroscopic approximations can be derived using angular basis func- 
tions. Commonly used are Legendre polynomials |19l IB], since they form an orthogonal basis on 
[— 1, 1]. Define the moments 



Il(x,t) 



I(x, (i,t)Pt(fjt) d/u. 



Multiplying (2) with P k and integrating over fi gives 



d t I k (x,t) + d x J fiP k (ti)I(x,[i,t)dfi+ (a{x) + K,(x))I k (x,t) = (a(x)I (x,t) + 2q(x,t))S k ,o- 

Using the recursion relation for the Legendre polynomials leads to 

\{2q-I ), k = 
K -(K + a)I k , k>0 



dJk + d x y~] huh 



where b 



ki 



_ fc+i 



2k+i°k+i,l -r 2k+l 0k 



5 k -i,i- This is an infinite tridiagonal system for k = 0,1,2, . 



dtlk + bk,k-id x Ik-i + bk,k+id x Ik+i — —Ckl k + Ik > 
of first-order partial differential equations. Using the (infinite) matrix notation 



(3) 



B 



1 Q 2 

3 " 3 

I o 

3 
7 



ft 



\ 



and q = 



7 



(2nq\ 




(4) 



V 

we can write ([3| as 

d t I + B • d x I = -C • I + q . 

The infinite moment system ([3| is equivalent to the transfer equation Q , since we represented an 
L 2 function in terms of its Fourier components. In order to admit a numerical computation, (|3| 
has to be approximated by a system of finitely many moments Jo, . . . , Ini i- e - all modes Ii with 
I > N are not considered. Mathematically, it is evident that such a truncation approximates the 
full system, since the Ii decay faster than I , due to 



l|/(av,*)l 12 



In order to obtain a closed system, in the equation for In, the dependence on ijv+i has to be 
eliminated. A question of fundamental interest is how to close the moment system, i.e. by what 
to replace the dependence on In+i- In the following, we focus on two types of linear closure 
approaches: the P/v closure and diffusion approximations. 



2.2. Pm closure. The simplest closure, the so-called Pn closure [7] is to truncate the sequence 
i.e. = for / > N. The physical argument is that if the system is close to equilibrium, then the 
underlying particle distribution is uniquely determined by the lowest-order moments. This can be 
justified rigorously by an asymptotic analysis of Boltzmann's equation [B]. 



2.3. Diffusion closures. The classical diffusion closure is defined for N = 1. We assume Ji to 
be quasi-stationary and neglect Ii for I > 1, thus the equations read 

dJo + d x h = -kIq + qo 
ld x I = -(k + ct)Ii . 

Solving the second equation for 1\ and inserting it into the first equation yields the diffusion 
approximation 

dJo ~ d x 3(g + B) d x I = -kI + q . (5) 
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A new hierarchy of Pn approximations, denoted diffusion correction or modified diffusion closure, 
has recently been proposed by Levermore |31] . In slab geometry, it can be derived in the following 
way: We assume that Ii — for I > N + 1. Contrary to Pn, the (N + l)-st moment is assumed 
to be quasi-stationary. Setting d t lN+i = yields the algebraic relation 

which, substituted into the equation for In, yields an additional diffusion term for the last moment: 

* T . I. IT AO T J«(2ff-/0), ^ = ... 

[-(«; + cr)J^, iV > 

where = &jv,jv+i 2A+3 = ^b? (2Ar+4)~(2JV+3) ' ^ or N = this closure becomes the classical 
diffusion closure ([5]). 

2.4. Other types of closures. Other higher order diffusion approximations exist, such as the so- 
called simplified Pn (SPn) equations. These have been derived in an ad hoc fashion an( ^ 
have subsequently been substantiated via asymptotic analysis [29 and via a variational approach 
|39l H]. Various nonlinear approximations exist in the literature, most prominently flux-limited 
diffusion (5U] and minimum entropy methods [531 HI \M I2T1 12"3"1 HUl |2"2] . 

3. Optimal Prediction 

Optimal prediction, as introduced by Chorin, Hald, Kast, Kupfcrman et al. O [TTJ [T3J |SJ HO] seeks 
the mean solution of a time-dependent system, when only part of the initial data is known. It is 
assumed that a measure on the phase space is available. Fundamental interest lies in nonlinear 
systems, for which the mean solution decays to a thermodynamical equilibrium. Applications 
include molecular dynamics |37j and finance |35j . The formalism has been developed in detail 
[TTl [T2l ITU] for dynamical systems 

x(t) = R(x(t)) , x(0) = x . (7) 

Let the vector of unknowns be split x = (xp, Xf) into the resolved variables xp that are of interest, 
and the unresolved variables xp that should be "averaged out" Q Assume that the initial conditions 
xp for the resolved variables are known, while the initial conditions x^ for the unresolved variables 
are not known or not of relevance. In addition, let a probability measure /(x) be given. For 
Hamiltonian systems, whose dynamics is derived from a Hamiltonian function -ff(x), this could 
be the grand canonical distribution /(x) = Z^ 1 exp (— /3i7(x)), where /3 is inversely proportional 
to the temperature, and Z = J exp (— j3H{s.)) dx. 

Given the known initial conditions xp, the measure / induces a conditioned measure / xc (xi?) = 
Z _1 /(xc, x^) for the remaining unknowns, where Z — j /(x^, Xf) dx^. An average of a function 
■u(xc,x.f) with respect to /x c is the conditional expectation 

p TTTi r i i /4x c ,x f )/(x c ,x f )dx F 

Pu = Euxc = 7^77 r~, ■ (8) 

J ./(xc,XFjdx F 

It is an orthogonal projection with respect to the inner product (u,v) = E [uv], which is defined 
by the expectation E [u] — J J u(xc, xp)/(xc, xj?) dxc dx^. Let y?(x, t) denote the solution of 
([7]), for the initial conditions x = (xc,xj?). Then optimal prediction seeks for the mean solution 

P<p(x,t) = E[<p(xc,XF,t)\xc] ■ (9) 

A possible, but computationally expensive approach to approximate (|9| is by Monte-Carlo sam- 
pling, as presented in [T3]. Optimal prediction formulates a smaller system for x^. First order 

4n this paper, we borrow a subscript notation from the area of multigrid methods, in which C stands for 
"coarse" , and F stands for "fine" . In the following sections, this notation is also used for block matrices and block 
operators. 
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optimal prediction |16j constructs this system by applying the conditional expectation ([8| to the 
original equation's Q right hand side. For Hamiltonian systems, the arising system is again 
Hamiltonian [TUj . 

An approximate formula for the mean solution can be derived by applying the Mori-Zwanzig 
formalism |33l 141] in a version for conditional expectations [11 to the Liouvillc equation for 
(0. It yields an integro-differential equation, that involves the first order right hand side, plus 
a memory kernel. First order optimal prediction can be interpreted as the crude approximation 
of dropping the memory term. Various better approximations have been presented in |121 1131 19]. 
For nonlinear systems, optimal prediction remains an approximation, even if the memory term is 
computed exactly. 



4. Conditional Expectations for Gaussian Measures 

In this section, we want to extend the optimal prediction formalism to (partial) differential equa- 
tions. To this end, we have to construct a measure on a suitable infinite-dimensional function 
space, and an expression for its conditional expectation. Both can usually be achieved by con- 
sidering a suitable sequence of finite-dimensional measures [251 13] ■ I n addition, a measure can be 
directly defined by its characteristic functional. Formally, the characteristic functional is given by 
the measure's Fourier transform. Here we focus on Gaussian measures, because they are one of the 
few classes where the construction above is possible, and where explicit formulas can be derived. 
Thus, we present the formalism for conditional expectations first in finitely many dimensions, and 
then in function spaces. 

4.1. Gaussians in Finite Dimensions. Due to Bochner's theorem [3], a measure on R" is 
uniquely determined by its Fourier transform, or characteristic functional 

0(y)= f /(x)cxp( ly T x)dx. (10) 

JR™ 

Indeed, if 

(1) 0(0) = 1, 

(2) 9 is continuous on R™, and 

(3) 9 is positive definite in the sense that the matrix (9(yi — yj))i,j=i,...,jv is positive semi- 
definite for all N and all y, ; e R™, 

then there is unique probability measure A with density / on the c-algebra of Borel sets of R n 



such that (10) holds 



Let A £ R nxn be a symmetric positive definite matrix, and m € R n . Then 

/(x)= 7 ^= e X pf-^(x-m) T A- 1 (x-m) N ) (11) 
Vdet(27rA) \ 2 / 

is a probability density on R™. Introducing the inner product generated by A as 

(x,y)A := x T Ay , 



the Gaussian measure with density (11) has the characterstic functional 

0(y) = exp (~{y,y)A + «y T m 

Obviously, this functional satisfies the three conditions above. 

The conditional expectation of the Gaussian, given parts of the vector x, is given by 
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Lemma 1. Decompose the vectors x, m and the matrix A into 

and A = 

mj? 

Then the conditional expectation is 









m c 


X = 


, m = 








my 



E [x|xc] = 



E [x c |x c ] 
E[x F |x c ] 



/ x c /(xc,x F ) dx F 
/ /(xc,xf) dx F 

J x F /(x c ,x F ) dx F 
/ /(x c ,x F )dx F 



Acc Acf 
Afc Aff 



m F + A FC A cc (x c - m c ) 



(12) 



Proof. The identity E[xc|xc] = x c is trivial. To calculate E[xf|x c ], consider M = A 1 , with 
the same block matrix notation for M as for A. One can easily verify that 

(x - m) T A- 1 (x - m) = ||x F - b F ||M FF + (x c - m c ) T (M cc - M cf M^M re ) (x c - m c ) , 

where bF = mj? — M^Mfc( x c — m c) 5 and the norm is defined as ||xf||m ff — x^Mffxf- 
This yields for the conditional expectation 

/ (xf — bF + bF) exp (— |||xf 

T 

2 



E[x F |x c ] = 



h F\\M FF ) dXi 



Jexp(-i||xF -bF||M PF ) dx f 



= "F 



Since M = A 1 , the block matrices satisfy MfcAcc+MffAfc = 0, which implies -M f pMfc = 
AfcA cc , and thus proves the claim. □ 



Expression (12 1 coincides with the one given in [15] ■ The conditional expectation is a projection, 
that can be written in the form 



fx = E [x|x c ] = Ex + Fm , 



using the projection matrices 



E 



AfcA cc 



and F 





-A F cA cc 



(13) 



The orthogonal complement is then 

Qx=(I - P)x = Fx - Fm . 

One can easily verify that P 2 = P, Q 2 = Q, and PQ = QP = 0. If the measure is centered around 
the origin, i.e. m — 0, the projections become simple matrix multiplications P — E and Q = F. 

4.2. Gaussians in Function Spaces. The construction of measures on spaces of functions uses 
the characteristic functional. Formally, all expressions from the finite-dimensional case generalize 
to the infinite-dimensional case. There are some mathematical subtleties related to this construc- 
tion. For the interested reader, we collect these in this section. 

Following [28j [3], we construct measures on the dual S' of a Hilbert space S of functions. This 
construction is based on the Bochner-Minlos theorem. Its key assumption is that S is nuclear, i.e. 
the identity in S is of Hilbert-Schmidt type. In that case, the three conditions from above on a 
characteristic functional are necessary and sufficient for the existence of a corresponding measure. 
The proof uses a sequence of finite-dimensional measures and Bochner's theorem. The following 
construction of a nuclear Hilbert space serves our latter purposes. 

Definition 1. Let A be a positive definite infinite matrix such that 



< OO 
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and let V be a Hilbert space. We define the Hilbert space l\(V), consisting of an infinite vector of 
elements of V , by the inner product 



(f.g)ii(v) 



J2 ( A « f i'gi)i 



In our case, we consider X = Z^(i 2 (IR)). In order to obtain a Gelfand triple S C X c S' with 
the desired property, we have to construct a space of smooth test functions and its dual space 
of distributions. There are several ways to do this. The following construction is standard and 
frequently used [3]. 

Definition 2. Let 

d 2 



H = -d^ +X ' 



1 , 



and define the Hilbert space 
product 



Let H-i(M) denote the dual of 



as the completion o/Cg°(K) in L 2 (JBL) with respect to the inner 
(f,g)n{R) : = (Hf,g)L 2 (R)- 



Then with S = l\(H(M.)) and S' = l\(H-i 



we 



have a Hilbert- Schmidt embedding S C X C S'. Thus by the Bochner-Minlos theorem [3], the 
characteristic functional 



9(i) = J exp{i (f,g) ss ,)dA(g) - exp (-^<f,f) 



x 



defines a unique probability measure on S' . In addition, a nonzero expectation value can be taken 
into account by noting that 

/ exp(i(f,g + m) ss ,)dA(g) = / exp(i (f,g) ss ,) dA(g) exp(i (f, m) ss ,) 
JS' Js> 



IS 

exp 



(f,fW+i(f,m) 



Thus we have 

Lemma 2. Given m G S' , the characteristic functional 

6(i) = J exp(* (f,g) ss ,)dA(g) - exp (~(f,f)x + * (f, 
defines a unique probability measure on S'. 



m 



ss< 



(14) 



In the same way as for the measure, certain moments or conditional expectations can be inherited 
from the finite-dimensional case. 

Lemma 3. Decompose the vector-valued distribution u 6 S', the vector-valued expectation value 





















, m = 


m c 


and A = 


'Acc 


Act" 








m / 






A FF 



u 

We denote by A(up) the conditioned Gaussian measure with respect to Up, given uc- Then for 
all vector-valued test functions f = [feff] ; w ^ have the conditional expectations 

J (fc,uc) ss > dA(u F ) = (tc,nc)ss> 

J (t F ,u F ) ss , d\(u F ) = (t F ,m F ) ss , + (f , A fc A C p(u c - m c )) ss , . 

Here, f {fci u c) ss' dA(u^) and J (fp,Up) ss , dA(u^) are the weak formulations of the integrals 
J uc dA(u^) and J up dA(u^), which can be interpreted as conditional expectations of an S' -valued 
random variable with probability distribution A(up). 
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Proof. The proof follows the proof of the Bochner-Minlos theorem. We approximate the infinite- 
dimensional Gaussian measure by a sequence of finite-dimensional Gaussian measures. For each 
of these, we have a formula for the conditional expectation, given by Lemma [l] By showing that 
the limit of conditional measures exists and by showing that the monomials are measurable, we 
obtain the conditional expectations above. □ 



As in the finite dimensional case, the projection can be written in matrix form 
(f,Pu) ss , = J (f,u) 33 , d\(u F ) = (f,Eu) ss , + (f,Fm 



SS' ' 



(15) 



using the same projection matrices ( 13 ) as in the finite dimensional case. As a short notation, we 
write (151 as Pu = Eu + Fm, or P = E in the case m = 0. In the following, whenever we use 
this short notation, we always mean in the weak sense (151. 



5. Linear Optimal Prediction 



We now apply optimal prediction to a linear evolution equation under a Gaussian measure. As 
derived in Sect. [4] the conditional expectation is an affine linear projection. Here, we consider 
a Gaussian centered around the origin, thus P = E. While this choice is reasonable in many 
cases, its main purpose is to simplify notation. The results transfer to the case m ^ 0, with affinc 
linear transformations instead of matrix multiplications. We present the Mori-Zwanzig formalism 
[33l l4Tj for a linear evolution equation 



d t u = Ru , u(0) = u 



(16) 



where u is a vector-valued distribution and R is a linear differential operator (or u is a vector 
and R is a matrix, for an ordinary differential equation) that is independent of space and time. 
Consider a Gaussian measure, defined by a symmetric positive definite matrix A (see Sect. [J}. 
Let the unknowns and the corresponding operators/matrices be split 



u 



uc 


, R = 


Rcc 


Rcf 




Rfc 


Rff_ 



and A = 



Acc 



A FF 



The conditional expectation of the coordinate vector u is 

Pu = E[u|u c ] = Eu, 



where we have the projection matrix (13 1 



E 



-l 



I 

a fc a; 

Also, define F = I — E as the orthogonal projection matrix. Due to linearity, for any matrix vector 
product Bu, the projection always applies to the vector itself 

PBu = E [Bu|u c ] = B • E [u|u c ] = B E 

where the projected matrix takes the form 

g _ Bcc + BcFAreA ( 
_T5 F c + Bi?i?A^cA, 

Let the solution operator of ( 16 1 be denoted by e tR . In addition, we consider the solution operator 



u = B ■ u , 



-l 
-1 



(17) 



(18) 



vtRF 



to the orthogonal dynamics equation 



d t u = RFu , u(0) = u . 

For R being a matrix, both e tR and e tRF are in fact matrix exponentials. For R a differential 
operator, they stand as a notation for the solution operators generated by R and RF, which we 
both assume to be well posed. The existence of the orthogonal dynamics has been proved for R 
the Liouville operator to a nonlinear differential equation [27 . 
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Theorem 4 (Dyson's formula). Let R be a differential operator and E + F = I a pair of orthogonal 
projection matrices. Let e tR and e tRF denote the evolution operators generated by R respectively 
RF. Then 



JR 



JRF 



e (t'S)RF REe sR ds _ 



e (ts)RF REe sR ds 



Proof. Define the evolution operator 

M(t) = e tR - e tRF - 

Its time derivative equals 

d t M(t) = Re tR - RFe tRF - REe tR - RF [ e {t ~ s)RF REe sR ds = RFM(t) 

Jo 

With the initial conditions M(0) = 0, we obtain that M(t) = V< > 0. 



□ 



Dyson's formula together with the projection splitting I = E + F yields an identity for the solution 
operator 

d t (e tR ) = Re tR = R{E + F)e tR = REe tR + RFe tR 



REe 

Ke tR 



tR 



RFe 



tRF 



RF 



e (t~s)RF REe sR ds 



(19) 



JRF RF+ f K (t_ s ) e sR dSj 

Jo 



where 1Z = KE is the projected differential operator, and K(t) — e tRF RF HE is a memory kernel 
for the dynamics. Matrix E has zeros in the right column, thus 1Z and K have the same structure 



n 



n cc o 

Kfc 



and K - 



K CC 
K FC 



As defined by ^ , the mean solution of ( 16 1 with respect to the measure defined by ( 14 ) is obtained 
by applying the projection operator to the solution operator. Since the solution operator is linear, 
property (17 1 yields 

u m (i) = Pe tR u = e tR Eu , 
i.e. the mean solution is a particular solution, obtained by evolving the projected (averaged) initial 
conditions. Thus, the mean solution operator is e tR E. Multiplying the identity (19) from the right 
by E yields an identity for the mean solution operator 

d t (e tR E) = TZe tR E + e tRF RFE + K(t - s)e sR Eds , 

in which the middle term cancels out, since FE = 0. This yields a new evolution equation for the 
mean solution 

f <9 t u m (i) = Ku m (t) + f* K(t - s)u m (s) ds 
|u m (0) =Eu 
which reads in block-components 

8tu% = Tlccn l S + K cc * ug , ug(0) = u c (20) 
dtv$ = ftFcug + K FC * u£ , u^(0) = A FC A cc u c . (21) 



We have derived an equation ( 20 1 in which the dynamics for the variables of interest is independent 
of the evolution of the averaged variables. The latter are typically not of interest, but if desired, 
they can be obtained by integrating ( pTj ). For ordinary differential equations, the choice A = R^ 1 
(the measure is invariant under the flow) yields that IZfc — 0. However, this property has no 
particular relevance for the optimal prediction equations. If the original system is nonlinear, an 
analogous integro-differential equation can be derived, which in that context [12 is denoted second 
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order optimal prediction. It is only an approximation. For linear systems, equation (20 1 yields the 
correct mean solution, hence we call it full optimal prediction. 



The simplest approximation to (20 1, called first order optimal prediction, is obtained by neglecting 
the convolution term, i.e. by solving the system 

d t u^ = Kccu° op , u£ op (0)=u c . (22) 

A better approximation can be obtained if a time scale r exists, after which the kernel becomes 
negligible: K{t) <C K(0) Vt > r. Assuming that u m — 0(1) over the time scale of integration, a 
piecewise-constant quadrature rule yields the approximation 



f K(t- s)u m (s)ds = f K(s)u m (t- s)ds sa f 
Jo Jo Jo 



K(s)u m (t-s) ds 



f K(0)u m (t) ds = TRFR~Eu m (t) , 
Jo 



(23) 



which leads to the second order optimal prediction system 

9 t u^ oop = K cc u s c oop + r (RFRE) CC u s ° op , u^ oop (0) = u c . (24) 

Here 

(RFRE) CC — RcfRfc+RcfRffAfcA C c—RcfAfcAccRcc~RcfAfcA C qRcfAfcA C c 

is a new linear differential operator, which is second order if R is a first order operator. Obviously, 
for t < r, the integral in ( 23 1 cannot stretch over the whole length r . Hence, a better approximation 
is 



/ K(t - s)u m (s) ds « min{i, r}i?Fi?Eu m (t) , 
Jo 



(25) 



leading to replacing r by mm{t,r} in the second order system (24 1. The examples in Sect. [6] and 
[7] shall enlighten this approximation. System (24) yields a classical diffusion correction approxi- 
mation, while expression d25| leads to a new crescendo-diffusion correction approximation. 



Obviously, the above considered piecewise constant approximations to the memory term are not 
very accurate quadrature rules either, though better than not considering the memory at all. A 
possible improvement could be achieved by applying a trapezoidal approximation between t and 
t — T, which would cut the coefficient in front of the second order operator in (24 1 in half. This and 
various other approximations have been outlined in |38j , and shall be considered in more detail in 
future work. 



6. Application to a Simple Model Problem 

We apply the formalism developed in Sect.[5]to a simple model for moment systems, which allows 
an analytical verification of the derived solutions and approximations. Consider the system 

d t u(x,t) = Ru(x,t) , u(x, 0) = u(x) , 

where u = (ui,U2) T , x G M, t > 0, and the differential operator is 

The solution is u(x,t) — e tR u(x), with the solution operator 

/ A- t +At A_ t -A t \ 

e tR = e-t ( A J_ At A J +At ) , 
V 2 2 / 
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-2-1012 
u 1 



Figure 1 . Contours of the Gaussian measure for 7=1 and (3 = \ . While the 
global measure is centered around the origin, the conditioned measure for U2 is 
centered around f3u\. 



where A t u(x) — u{x — i) is the shift operator. The solution moves equally in both directions with 
velocity ±1, while decaying exponentially. Consider a Gaussian measure (14), independent of x 



and t, which is centered at the origin, defined (as in (11 )), by the positive definite matrix 



1 P 

P 7. 

We assume that the variable u\ is of interest, while is to be averaged out. While our goal is 
to derive a system for u± only, the measure enables us to encode a correlation between u\ and U2- 
For instance, we may know that for u\ given, should be close to \u\, in which case we would 
set /3 = \. On the other hand, if no correlation is known, then we would choose /3 = 0, which 
yields a decoupled measure. Figure [T] shows the contours of a Gaussian measure, corresponding 
to the case 7=1 and (3 = |. While the global measure in the two dimensional phase space is 
centered around the origin, the conditioned measure for 112 (here shown for u\ — 1) is centered 
around (3u\. 



The conditional expectation projection matrices ( 13 1 are 

and F = 



E 



-P 



The mean solution operator with respect to the measure is 

'Ma . _l Iz^a, 



e tR E 



2 A- 



1-/3 A 



(26) 



The first component of the mean solution moves in both directions with velocity ±1, while decaying 
exponentially. If (3 ^ 0, it is not split equally. Instead, there is a bias in one direction, given by the 
correlation between the variables. For the optimal prediction expressions, the projected differential 
operators 

V-7 0) - *-(-f -1. 

are of interest. The former 1Z = KE is the first order optimal prediction differential operator. The 
latter RF defines the orthogonal dynamics 



RE = 



d t u(x,t) = RFu(x,t) , 11(21,0) = u(a;) . 
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Figure 2. First solution component for t G {0.1, 1, 2, 3} with various approximations. 



Its solution u(x,t) — e tRF u(x) is given by the operator 

e~ s A Ps dsRF 

o 



„tRF 



This formula indicates that the approximation e w /, as done in (23 1, is accurate for short 



times. For the computation of the memory kernel, the combined operators 

and 



RFRF = ( 1 — (3 2 ) ( ° 



(RF) 2 KE = (1-/3 2 ) 



2\ I fidxxx ~~ d xx 
(id xx + d x 



{-@d x - I)RFRE 



are required. Integration by parts yields the identity 



<9.rx 



/+ / e- s A 0s ds(-(3d x -I) = e- t A 0t 
Jo 

Hence the memory kernel becomes 

K(t) = e tRF RFRE = (1 - /3 a )e-*A^ t Q 

The full optimal prediction evolution equation for the first component is 

dtuf{x, t) = (-1 + Pd x )u^{x, t) + (1 - /3 2 ) / e s ~ t A /3 ( t _ s < ) d xx u™(x, s) ds . 

Jo 



One can verify by differentiation that its solution is in fact the mean solution ( 26 ) 
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In more complex systems, the memory term can be difficult to compute exactly. Instead, it is 



approximated. The first order approximation (22 1 

o foop rt o foop foop 

d t u x v - f3d x ii 1 p = -u x 

is obtained by dropping the memory term. It is a scalar hyperbolic conservation law with a source 
term. Its solution Ui° p (x,t) = e~*A_^ t ui moves with velocity f3, while decaying exponentially. 
A comparison with the mean solution (26) reveals that the decay rate is represented correctly. 



However, the movement into two directions, as given by the full system, is replaced by a movement 



with an average velocity, given by the bias (3. The second order approximation (24 1 

dSOOp r>t~i SOOp SOOp 



SO 

po x u Y 



+ r(l - (3 Z )3 X 



soop 

■I 1 



adds a diffusion term to the first order approximation, which involves a characteristic time scale 
t. Although qualitatively of different nature than the original hyperbolic system, this diffusion 
yields an information propagation in both directions, a property that the first order solution was 
lacking. 

Figure [2] shows the time evolution (with (3 = \, to the initial condition u\{x) — exp(— x 6 )) of 
the first component of the mean solution (solid), first order optimal prediction (dashed), and 
second order optimal prediction (dash-dotted). In the latter, the time scale r = 1 is chosen. 
One can observe the aforementioned effects. The mean solution splits into two uneven parts. 
First order optimal prediction yields a single wave that travels with the weighted average velocity. 
Second order optimal prediction smears this wave out, thus creating information propagation in 
both directions. Note that this model problem is selected to illuminate the application of the 
formalism. It is not an example for which the considered approximations are particularly good. 



7. Application to the Radiation Moment System 



We now turn our attention to the infinite moment system ^ . For consistency with the notation de- 
veloped in Sect. [5] we denote the (infinite) vector of moments by u(x, t) — (uo(x, t), u\(x, t), . . . ) T . 
In addition, we neglect the forcing term, since it is unaffected by any truncation. The radiation 
system (|3]) can be written as 

d t u = Ru , 

where the differential operator R = — Hd x — C involves the (infinite) matrices Q. We consider 
a Gaussian measure on the space of unknowns, defined by an (infinite) matrix A. The choice of 
a Gaussian measure is motivated by the linearity of the arising projections. We are unaware of 
any physical reason why Gaussian measures should be particularly suitable for radiative transfer. 
In fact, unlike many Hamiltonian systems of ordinary differential equations (for which the Hamil- 
tonian function gives the energy and thus yields a canonical measure) , radiative transfer does not 
possess a canonical measure on its angular moments (since the energy of the system is given by the 
lowest moment only). For the here considered Gaussian measures, the question, which measure is 
a good measure, is addressed in Sect. 7.1 The question, how to approximate the memory term, 
is addressed in Sect 



7.1. First Order Optimal Prediction. We wish to truncate the system after the N-th compo- 
nent. The system and the measure are split into blocks 

"Acc 
A FC 



u 





, B = 




Bcf 


, c = 


Ccc Ccf 










Cpc Cpp_ 



and A = 



A C F 

Aff 



For the radiation system, we have 

( ° 

Bcf — 

\ N+l 

\2N+1 



Ccf = and C FC = 
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Figure 3. Spatial moments dependence tree for N = 3. 



Due to (18 1 the projected differential operator's upper left block is 

TZcc = ~ (B C c + BcfAfcAcc) d x - C cc ■ (27) 

The modification term Bcf-^fc-^-cc nas nonzero entries only in its last row. Hence, first order 
optimal prediction yields a true closure relation, since only the last equation is modified. The 
modification is jj^j times the first row of Apc-^-cC' i- e - ^ ne c l° sure depends solely on the 
correlations between the moments, given by the measure. We can see that, depending on the 
choice of A, first-order optimal prediction can generate all possible linear hyperbolic closures. 

We see that the correlation matrix between the resolved and the averaged modes Ape, which 
is given by the measure, defines the closure. The simplest choice is to prescribe no correlation 
between the two classes of modes, i.e. Ape = 0. In this case, the system is plainly cut off, and 
thus the classical P/v closure is obtained. This closure is optimal is the sense of spatial moments, 
as the following theorem states. 



Theorem 5. Among all linear hyperbolic closures, the simple Ppj -closure (Sect. 2.2) is optimal, 



in the sense that for initial conditions that satisfy J un + i(x) dx — 0, it reproduces all spatial 
moments of the radiative intensity uq up to the (N + l)-st one exactly. 



Proof. For the vector of moments with respect to the angular variable /i, we now consider its 
moments with respect to the spatial variable x. The Z-th spatial moment is 



here w.l.o.g. formulated with respect to the origin x = 0. Consider the source-free radiation system 
([7]) with no influence from boundaries (Cauchy problem, periodic solutions, or zero boundary 
values). Then each spatial moment evolves according to 

^m 1 = J x l d t udx = -B J x l d x udx C J x l udx = IBm 1 " 1 - Cm' . (28) 

Starting from the lowest spatial moments 

m°(t) = e- tc m° , (29) 



relation ( 28 1 allows to successively compute the time evolution of all higher moments m' . Of 
main interest are the spatial moments of the radiative density Uo- Since B is tridiagonal, the 
spatial moments mg, mj, . . . , rn^ +1 can by computed by a triangular structure shown in Fig. [3] 
Arrows indicate influence. Observe that the first N + 1 spatial moments depend only on the initial 
conditions of the first N + 1 angular moments u , . . . , Ujv+i- 

Now consider the system being truncated after the N-th moment, using a linear closure, as given 
by (27 1. Two modifications arise in the dependence tree in Fig.|3j First, the dependence on 7tt,^ +1 



is eliminated. Since this dependence is cut off for any linear closure, we consider only initial 
conditions for which rn° N+1 = 0, as stated in the theorem's assumption. Due to (29 1 this implies 
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m %+i(t) — Vt > 0. Second, the moment m l N may now depend on all lower moments mj, . . . , m N . 
No matter what this new dependence is, it only modifies the moments m^ +1 ~ k \/k > 0, which 
are marked bold in Fig. [3] In particular, the moments m[j, . . . , tjiq are unaffected by the closure, 
and only m,Q +1 is influenced. As can be seen in Fig. [3] the Pjy closure leaves the dependence tree 
unchanged, and thus reproduces all up to the (N + l)-st spatial moment of uq exactly. □ 



Clearly, the assumption Th^ +1 = may not be reasonable in certain cases. However, if the 
initial conditions satisfy this assumption, then the decoupled measure is optimal in the sense of 
spatial moments. Of course, in the case of a decoupled measure, the potential of the optimal 
prediction formalism is not fully used, and closures derived from coupled measures may yield 
better approximations in other terms than spatial moments. We will address this aspect in future 
work. In the following considerations, we restrict to uncoupled measures. 



7.2. Second Order Optimal Prediction. Due to Thm. [5] for the radiation system, first order 
optimal prediction yields an optimal approximation with a decoupled measure. Thus, also for 
second order optimal prediction we use a decoupled measure, although in the presence of a memory 
term, a coupled measure may have advantages. A decoupled measure has Ape = 0, thus 



RE = 



Rcc 
Rfc 



RF = 



Rcf 

Rpp 



and RFRE = 



RcfRfc 
RffRfc 



For the radiation system ^ we get 

RFRE = 

where 

BrrpB 



'BcF^Fcdxx 
Bpp'BFcdxx + CpF^Fcdx 



CF&FC 











(2N+l)(2N+3) / 



Formally, the memory term uses solution values at all previous times. However, all solution 
components (except the lowest one) decay at the rate n + a. This yields a time scale r = 
over which information from the past can be seen in the solution (r is a time scale since we have 
1) 



set c : 



Hence, the second order optimal prediction approximation (24 1 becomes 

/ 



K cc (s)u c (t - s) ds w t (RFRF) CC u c 



Q 



(N+l) 



\ 



\dxxUNf 



\ T {2N+l){2N+3) Wxx ^ 

It adds a diffusion term into the last component of the truncated system. This is the diffusion 



correction closure ^ by Levermore [3T], as outlined in Sect. 2.3 In the case N = 0, it is equivalent 
to the classical diffusion approximation. 

Although here we have assumed spatially homogeneous coefficients, we expect that the equations 
can be adapted to the space-dependent case in analogy to diffusion theory. Specifically, if k(x) 
and a(x) are space dependent, we define t(x) — j^ytp^jy, and replace rd xx UN by d x (r(x)d x UN)- 
The validity of this approximation will be addressed in future research. 



7.2.1. Crescendo-diffusion. We have re-derived diffusion and diffusion correction closures from the 
optimal prediction procedure, and we see clearly which approximations have been done. As an 
obvious example of improvement of the classical approximations, one should replace the coefficient 
r by min{r, t}, since the memory kernel cannot stretch over the full length for t < r. In other 
words, the diffusion shall ramp up gradually over time. We denote this approximation crescendo- 
diffusion (N = 0), respectively crescendo-diffusion correction (N > 0) closure. 
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Figure 4. Evolution of energy distribution for various moment closures for N = 0. 
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Figure 5. Evolution of energy distribution for various moment closures for N = 1. 
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Figure 6. Evolution of energy distribution for various moment closures for N = 3. 



The crescendo modification introduces an explicit time-dependence. The physical rationale is that 
at t = 0, the state of the system (at least the unknowns of interest) is known exactly. Information 
is lost as time evolves, due to the approximation. This implies that crescendo diffusion closures 
shall only be used if the initial conditions are known exactly or at least with higher accuracy 
than the approximation error of the system. In Sect. [8] we investigate the quality of the crescendo 
approximation numerically. 



8. Numerical Results 



8.1. One-Dimensional Test Problem. Consider the ID slab geometry system ([2| on x e]0, 1[ 
(periodic boundary conditions) with n = a = 1.5, and no source q = 0. The initial conditions are 

uq(x, 0) = exp ^—500 (x — \) 2 ^j and Uk{x, 0) = Vfc > 1. The numerical solution is obtained using 

a second order staggered grid finite difference method with resolution Ax = 1CP 3 and At = 0.8Ax. 
Diffusion and diffusion correction are implemented by two Crank-Nicolson half steps. 

The time evolution t 6 {0.1, 0.2, 0.3, 0.4} of the lowest moment uo(x, t) for various system sizes is 
shown in Fig. [3] for N — 0, Fig. [5] for N = 1, and Fig. [6] for N = 3. In each case, the solid line 
is the solution to the full radiative transfer equation, obtained using a Pjy closure with N = 51. 
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The dotted function is the classical P/v closure (Sect. 2.2 1. The dashed curve is the diffusion 
correction closure ([6]), which for N — equals the diffusion closure The dash-dotted line is 



the crescendo-diffusion correction, as derived in Sect. 7.2 



For N = the P/v closure gives a sole exponential decay of the initial conditions, which is a very 
bad approximation. As derived in Sect. |7.f | a non-diagonal measure could at best result in a 
shift of the solution to either side, which would be an even worse solution, due to the additional 
break in symmetry. The diffusion closure is significantly better than P/v, but too diffusive. The 
crescendo-diffusion remedies this problem fairly well, and yields a good approximation, especially 
for short times. Note that here the crescendo-diffusion reaches its full diffusivity at t = ^. 



For N = 1, the P/v solution splits into two peaks which move sideways. As shown in Sect. |7.1| 
their advection velocity is optimal such that the spatial variance (with respect to io = ^) of the 
correct solution is preserved (other measures would violate this property). Similarly, for N = 3, 
the P/v solution splits into four peaks, while preserving all spatial moments up to order 4 (see 
Thm.|). 

In the cases N = 1 and N = 3, the diffusion correction approximation applies a diffusion to the 
highest resolved moment ujy. Hence, qualitatively, an iV-th order diffusion correction solution 
lies "between" the Pn-i and the Pn solution. However, the classical diffusion correction is too 
diffusive in un, thus reducing its influence on the second-highest moment un-i too drastically. 
Consequently, the N = 1 diffusion correction is too close to Pq, and the N — 3 diffusion correction 
is too close to P2. In both cases, the crescendo-diffusion correction model ameliorates this problem, 
resulting in an improved approximation. 



8.2. Two-Dimensional Lattice Problem. As a more complex numerical test we consider a 2D 
checkerboard structure of different materials. The 2D sy-geometry, shown in Fig. [7] is identical 
to the example presented in [5J, however the parameters are modified here to have k — Ocm^ 1 , 
a = Q.lcmT 1 in the highly scattering regions (white in Fig. 0, and k = 10cm , <7 = Ocm 1 in 
the highly absorbing regions (gray squares in Fig. |7l. A source (hatched square in Fig. |7| q = 1 
is switched on at t — Os. The final time is t — 2s. This test case lies in an intermediate regime 
between thin and thick media, ft is purposefully chosen to not be in the diffusive regime, since in 
that regime, the already small differences in the solutions would be almost invisible. 

The benchmark solution, obtained by a P7 approximation, is shown in Fig. [8j ft is computed in 
COMSOL using finite elements with streamline diffusion. We used about 25000 bilinear elements. 
Details can be found in [36]. The solution obtained by the classical diffusion approximation is 
shown in Fig. [9j The crescendo-diffusion approximation is shown in Fig. |10| Both solutions are 
obtained by a standard second-order finite difference approximation on a rectangular grid of size 
400 x 400, and a first-order time integrator with a time step of 3 • 10 -5 . 

The crescendo solution is visibly closer to the true solution, ft shows significantly sharper features. 
The radiation front that leaves the checkerboard part is not as smeared out as in the uncorrected 
diffusion approximation. Note that the color scale is logarithmic. Thus, in the empty regions 
around the checkerboard, crescendo diffusion is roughly one order of magnitude more accurate 
than diffusion, which is a noticeable improvement. Note again that this test does not lie in the 
diffusive regime. Therefore, it cannot be expected that the diffusion approximation leads to good 
results. However, it clearly depicts that the crescendo modification greatly improves the results. 
In some applications this improvement might be sufficient. 

Computationally, the crescendo approach is a simple modification that comes at no additional 
cost. On the contrary, a speedup compared to a pure diffusion approach is observed, since time 
step restrictions due to diffusion are relaxed. 
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FIGURE 7. 2D lattice: geometry. 




Figure 9. 2D lattice: diffu- 
sion approximation. 
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Figure 8. 2D lattice: true solution. 




Figure 10. 2D 
crescendo-diffusion . 
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9. Conclusions and Outlook 

We have formulated optimal prediction for a system of linear partial differential equations with 
an underlying Gaussian measure. An identity for the evolution of a finite number of moments is 
obtained. Approximations to this identity yield different closures for a truncated version of the full 
system. We have applied the developed formalism to radiation problems, for which truncation and 
closure strategics arc fundamental challenges. While traditionally closures had been derived using 
physical arguments or by asymptotic analysis, the optimal prediction formalism yields closures by 
approximating a mathematical identity. In this fashion, we can re-derive classical linear closures, 
such as Pn, diffusion, and diffusion correction approximations. The particular approximations 
done become clearly visible. In addition, existing closures can be improved, such as the derived 
crescendo-diffusion (correction), which remedies problems of the diffusion (correction) approaches, 
while itself being a simple modification of existing approaches that comes at no additional cost. 
Other new types of closures can be derived from the presented formalism, by approximating the 
memory term in the identity to higher order. We shall do so in future work. 
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The limitations of the presented analysis lie obviously in the double- linearity. First, the equations 
are linear, second, the considered measures are Gaussian. Flux-limited diffusion [30 and minimum 
entropy closures |341 [TJ [20] yield nonlinear systems, hence they cannot be derived from Gaussian 
measures. However, it is possible that they can be derived by optimal prediction with a non- 
Gaussian measure. A first step towards non-Gaussian measures can be a perturbation analysis, 
such as in [T7] . 

Moment models for radiative transfer are examples of systems for which no straightforward mea- 
sure (on the space of moments) is given by the physics of the process (unlike many Hamiltonian 
systems). Among Gaussian measures, we have answered the question for the best measure in a 
simple case by considering first order optimal prediction only. For more general approximations, 
possibly also for non-Gaussian measures, this question is worth a deeper analysis. 
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